Patent abstract:
The present invention discloses a crop pest detection method based on Feature Fusion Single Shot Multibox Detector Inception V3 (F-SSD-IV3), including the following steps: (1) capturing pest images to construct a crop pest database; (2) constructing an F-SSD-IV3 target detection algorithm, using Inception V3 to replace VGG-16 as a feature extractor, designing a feature fusion method to conduct fusion on context information for output feature maps of different scales, and finally fine-tuning candidate bounds by using Softer NMS; and (3) optimizing a network during training, and improving detection performance and a model generalization capability by using a method of amplifying data and adding a Dropout layer.
Publication number: NL2025689A
Application number: NL2025689
Filing date: 2020-05-27
Publication date: 2020-12-03
Inventors: He Yong; Zeng Hong; Wu Jianjian; Xu Jian
Applicant: Univ Zhejiang
Main IPC class:
Patent description:

P3457ONLOO/TRE Title: CROP PEST DETECTION METHOD BASED ON F-SSD-IV3
TECHNICAL FIELD The present invention belongs to the field of deep learning and computer vision, and in particular, to a crop pest detection method based on Feature Fusion Single Shot Multibox Detector Inception V3 (F-SSD-IV3).
BACKGROUND With the continuous growth of the global population, the demand for grain is also increasing dramatically. Due to the influence of the natural environment and crop-related factors, crops are inevitably attacked by pests at different growth stages. If the pests cannot be detected and eliminated in time, an outbreak of pests may occur. A large-scale outbreak of pests will affect the healthy growth of crops, thereby greatly reducing the yield and quality of the crops.
Conventional pest identification is based on morphological features such as shape, color, and texture, and relies on manual identification. As a result, it is subjective, has poor timeliness, and is labor-intensive. Early pest identification is based on template matching technology and simple models, and features of a pest image are extracted by using hand-crafted descriptors. Common features include the histogram of oriented gradients (HOG), the local binary pattern (LBP), scale-invariant feature transform (SIFT), Haar-like features, the deformable parts model (DPM), etc. However, hand-crafted features depend on prior knowledge. Therefore, it is difficult to accurately express the color and morphology of a target pest, and such features lack robustness. In addition, the application scenarios of the above-mentioned methods are limited and only suitable for an ideal laboratory environment.
In recent years, relying on the powerful feature expression capability of the convolutional neural network (CNN), target detection methods based on deep learning have made great breakthroughs in detection performance. In general, target detection methods based on deep learning can be divided into two types: methods based on candidate regions and methods based on regression. In a target detection method based on candidate regions, the algorithm first generates candidate regions in an image, then extracts features from the candidate regions to generate regions of interest (RoIs), and finally conducts classification and regression. Common algorithms include R-CNN[53], Fast R-CNN[54], Faster R-CNN[55], and R-FCN[56]. Such methods have relatively high accuracy but a low detection speed. Currently, a main trend of object detection is faster and more efficient detection. Target detection methods based on regression, such as YOLO[59] and SSD[60], have an obvious advantage of a high detection speed. For an input image, bounding boxes and their
categories are predicted simultaneously at multiple positions of the image without generating candidate regions. A limitation of YOLO lies in the strong spatial constraint it places on the prediction of bounding boxes; therefore, it is difficult to detect small targets at multiple scales. In terms of detection speed, SSD can basically achieve real-time performance, but it has relatively poor detection performance for small targets. In an actual field environment, the background is complex, pest types and postures are diverse, and the target size in a captured pest image is relatively small. Consequently, existing detection methods cannot well satisfy the needs of the crop pest detection field.
SUMMARY To resolve the problem that existing detection methods cannot adequately balance detection speed and detection accuracy, and based on the characteristics of existing pest images (a small number of samples, small target objects, diverse posture changes, and frequent occlusion), the present invention proposes a new F-SSD-IV3 target detection method for crop pest detection, which improves the SSD target detection algorithm.
To achieve the foregoing objective, the present invention provides the following technical solution, including the following steps, as shown in FIG. 1: (1) Capture pest images through internet downloading, smartphone shooting, digital camera shooting, etc. to construct a crop pest database.
(1-1) Convert all the RGB pest images to the JPEG format, and name the images with pest names and sequential numbers.
(1-2) Label the category of each pest and a rectangular bounding box in the image by using the image annotation tool LabelImg, where the rectangular bounding box is defined by four pieces of coordinate information: xmin, ymin, xmax, and ymax.
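For illustration, an image labeled this way can be read back with a short script; this is a minimal sketch that assumes LabelImg's default Pascal VOC XML output, and the file name in the usage comment is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text            # pest category label
        bb = obj.find("bndbox")
        boxes.append((
            name,
            int(bb.find("xmin").text),
            int(bb.find("ymin").text),
            int(bb.find("xmax").text),
            int(bb.find("ymax").text),
        ))
    return boxes

# Example (hypothetical file name): boxes = parse_voc_annotation("aphid_0001.xml")
```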
(2) Construct an F-SSD-IV3 target detection algorithm, use Inception V3 to replace VGG-16 as a feature extractor, design a feature fusion method to conduct fusion on context information for output feature maps of different scales, and finally fine-tune a candidate bound by using Softer NMS, where the method is shown in FIG. 2, and a detailed process includes the following: (2-1) Select the Inception V3 as a basic network of the F-SSD-IV3, where a structure of an Inception V3 network is shown in FIG. 3, and includes a convolutional layer, a convolutional layer, a convolutional layer, a pooling layer, a convolutional layer, a convolutional layer, a pooling layer, Mixed1_a, Mixed1_b, Mixed1_c, Mixed2_a, Mixed2_b, Mixed2_c, Mixed2_d, Mixed2_e, Mixed3_a, Mixed3_b, Mixed3_c, a pooling layer, a dropout layer, and a fully connected layer; a size of an input image is 300x300x3; dimensions of convolution kernels include 1x1, 1x3, 3x1, 3x3, 5x5, 1x7, and 7x1; the pooling layer includes maximum pooling and average pooling, and has a dimension of 3x3; and sizes of the obtained feature maps are 149x149x32, 147x147x32,
147x147x64, 73x73x64, 73x73x80, 71x71x192, 35x35x192, 35x35x256, 35x35x288, 35x35x288, 17x17x768, 17x17x768, 17x17x768, 17x17x768, 17x17x768, 8x8x1280, 8x8x2048, 8x8x2048, and 1x1x2048.
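A minimal sketch of using a pretrained Keras Inception V3 as the feature extractor of step (2-1) is given below; the layer names "mixed2", "mixed7", and "mixed10" are assumed to correspond to the Mixed1_c, Mixed2_e, and Mixed3_c outputs named above and should be verified against the actual network definition used.

```python
import tensorflow as tf

def build_backbone(input_shape=(300, 300, 3)):
    """Inception V3 backbone returning three multi-scale feature maps."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Assumed Keras counterparts of Mixed1_c (35x35), Mixed2_e (17x17), Mixed3_c (8x8).
    outputs = [base.get_layer(name).output
               for name in ("mixed2", "mixed7", "mixed10")]
    return tf.keras.Model(inputs=base.input, outputs=outputs)

backbone = build_backbone()
feat_35, feat_17, feat_8 = backbone(tf.zeros((1, 300, 300, 3)))
```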
(2-2) Then add an additional network of six convolutional layers after the Inception V3, where sizes of the convolution kernels are respectively 1x1x256, 3x3x512 (with a stride of 2), 1x1x128, 3x3x256 (with a stride of 2), 1x1x256, and 3x3x128 (with a stride of 1); and obtain three feature maps with gradually decreasing sizes, which are respectively 4x4x512, 2x2x256, and 1x1x128.
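A sketch of this additional network might look as follows; padding choices are assumptions (the patent gives only kernel sizes and strides), and the last layer deviates from the stated stride of 1 so that a 1x1 map can actually be reached from a 2x2 input.

```python
import tensorflow as tf
from tensorflow.keras import layers

def extra_feature_layers(x):
    """Step (2-2): shrink the 8x8 backbone output to 4x4, 2x2 and 1x1 feature maps."""
    x = layers.Conv2D(256, 1, activation="relu")(x)
    f4 = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)   # 4x4x512
    x = layers.Conv2D(128, 1, activation="relu")(f4)
    f2 = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)   # 2x2x256
    x = layers.Conv2D(256, 1, activation="relu")(f2)
    # The patent lists a 3x3x128 kernel with a stride of 1 here; to reach a 1x1 map
    # from a 2x2 input this sketch uses stride 2 with 'same' padding (assumption).
    f1 = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)   # 1x1x128
    return f4, f2, f1

# e.g. f4, f2, f1 = extra_feature_layers(feat_8)
```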
(2-3) Conduct feature fusion on the feature maps output in step (2-2), a Mixed1_c feature map, a Mixed2_e feature map, and a Mixed3_c feature map, to resolve the problem that it is difficult to detect small target objects in the later stages of the original SSD target detection method due to a serious lack of global context information. The feature fusion method is shown in FIG. 4, and specifically includes first conducting deconvolution on the feature map at the next layer, then fusing the feature map at the next layer with the feature map at the current layer in a cascading manner, and outputting a new feature map. The output candidate bounds in the network structure can be represented as the following formula:
Output candidate bounds = {P_n(f_n), P_{n-1}(f'_{n-1}), ..., P_k(f'_k)}
where f'_{n-1} = f_{n-1} + f_n, ..., f'_k = f_k + f_{k+1} + ... + f_n, and n > k > 0; f_n represents the feature map output at the n-th cascaded layer, and P represents the candidate bounds generated for each feature map.
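A minimal sketch of one such fusion step between a current-layer map and the next (deeper) map is given below; it follows the cascading module detailed in the next paragraph (deconvolution, two 3x3 convolutions, normalization before concatenation, and a 1x1 convolution), while the filter counts, the upsampling factor, and the explicit resize are assumptions not fixed by the patent.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse(current, deeper, filters=256):
    """One cascading fusion step: upsample the deeper map and merge it with the current map."""
    # Deconvolution (transposed convolution) roughly doubles the spatial size.
    up = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                activation="relu")(deeper)
    # The doubled size may not match exactly (e.g. 17x17 -> 34x34 vs. 35x35),
    # so resize to the current layer's spatial size (assumption).
    up = tf.image.resize(up, (current.shape[1], current.shape[2]))
    # Two 3x3 convolutions to better learn features from the upsampled map.
    up = layers.Conv2D(filters, 3, padding="same", activation="relu")(up)
    up = layers.Conv2D(filters, 3, padding="same", activation="relu")(up)
    # Normalize both branches before concatenation.
    up = layers.BatchNormalization()(up)
    cur = layers.BatchNormalization()(current)
    fused = layers.Concatenate(axis=-1)([cur, up])
    # 1x1 convolution for dimensionality reduction and feature recombination.
    return layers.Conv2D(filters, 1, activation="relu")(fused)
```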
"+" in FIG. 4 represents a cascading module formed by a deconvolution layer, 3x3 convolution layers, and a 1x1 convolution layer, and can transfer an advanced feature to a lower layer. To combine feature maps of different sizes, the cascading module uses the deconvolution layer to generate and input feature maps with a same height and width; then uses two 3x3 convolution layers to better learn features; and uses a standardized layer before connection to conduct normalization processing on the input feature maps. Normalization can resolve a problem of gradient explosion, and can greatly increase a training speed during network training. Concat can combine two feature maps. Other dimensions of the two feature maps are same except a stitching dimension. The 1x1 convolutional layer is introduced for dimensionality reduction and feature recombination.
(2-4) Conduct convolution on k candidate bounds at each position in an m×n feature map, where the size of the convolution kernel is (c+4)k, predict c category scores and four position offsets, and finally generate m×n×k(c+4) predicted outputs. For the candidate bounds of the feature maps, the minimum scale in the original SSD is S_min = 0.2 and the maximum scale is S_max = 0.9. In the present invention, S_min = 0.1 and S_max = 0.95, so the size range of the candidate bounds of the feature maps is larger. To ensure a smooth scale transition between layers, a new scale (S_k + S_{k+1})/2 is added for the feature map at each layer in the present invention, so as to improve the detection accuracy. In addition, the default aspect ratios of the candidate bounds are set to a_r ∈ {1, 2, 3, 1/2, 1/3}. When a_r = 1, an extra candidate bound is added, and its scale is S'_k = √(S_k · S_{k+1}).
(2-5) During detection by the original SSD algorithm, the NMS is used to preserve candidate bounds with relatively high confidence coefficients, and a large number of overlapping candidate bounds are generated (24,564 candidate bounds are generated by SSD512). In the present invention, the Softer NMS is used to fine-tune the candidate bounds as follows: (1) A candidate bound is selected by using the Soft NMS. (2) For each selected candidate bound M, it is determined whether the IoU of another candidate bound with the candidate bound M is greater than a threshold p. (3) Weighted averaging is conducted on all candidate bounds whose IoUs are greater than the threshold p, and the position coordinates of the candidate bound are updated.
(2-6) The loss function of the SSD is formed by two parts, a position loss L_loc and a classification loss L_conf, and can be represented as follows:
L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))
where N represents the number of candidate bounds matching a real boundary, c is the confidence coefficient of each type of candidate bound, l is the value of the translation and scale change of a candidate bound, g is the position information of the real boundary, and α = 1 by default.
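As a concrete illustration of the candidate-bound refinement in step (2-5), the following sketch replaces each box kept by (Soft-)NMS with a score-weighted average of all boxes whose IoU with it exceeds the threshold p; the array layout and the use of detection scores as weights are assumptions about the patent's description.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box [x1, y1, x2, y2] against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter + 1e-9)

def refine_boxes(kept_boxes, all_boxes, all_scores, p=0.6):
    """Replace each kept box by the score-weighted mean of its IoU > p neighbours."""
    refined = []
    for m in kept_boxes:                 # boxes surviving Soft-NMS
        mask = iou(m, all_boxes) > p     # neighbours overlapping M by more than p
        w = all_scores[mask]             # assumes M itself is contained in all_boxes
        refined.append((w[:, None] * all_boxes[mask]).sum(axis=0) / w.sum())
    return np.array(refined)
```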
(3) Optimize the network during training, and improve detection performance and model generalization capability by using a method of amplifying data and adding a Dropout layer.
(3-1) The data set of pests is relatively small, it is relatively difficult to obtain new data, and relatively high costs are required to obtain a data set with sufficient labels. Therefore, a data amplification method is adopted in the present invention to expand the data set. Data amplification can be represented as the following formula:
T = Ö(S)
where S represents the raw training data, T represents the data obtained after data amplification, and Ö is the adopted data amplification method.
In the present invention, common data amplification manners are adopted: the luminance, contrast, and saturation of an image are randomly adjusted, and flipping, rotation, cropping, and translation are applied to the image. Finally, the training set is expanded fivefold.
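A minimal sketch of such amplification with TensorFlow image ops is shown below; the jitter ranges are assumptions, and for flipping, rotation, cropping, and translation the bounding-box coordinates must be transformed accordingly (omitted here).

```python
import tensorflow as tf

def augment(image):
    """Random photometric jitter plus horizontal flipping for one image tensor."""
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_flip_left_right(image)
    return image
```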
(3-2) The Dropout policy can prevent a problem of model overfitting. During network training, some neurons at a hidden layer are randomly inhibited in each iteration at a probability p, and finally a comprehensive averaging policy is used to combine different neural networks as a final output model. In the present invention, probabilities of randomly inhibiting some neurons at the hidden layer are p=0.5, 0.6, 0.7, 0.8, 0.9.
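For reference, a dropout layer parameterized this way might be declared as follows; note that whether p is interpreted as the suppression probability (as stated above) or as a keep probability depends on the framework convention, which the patent does not specify.

```python
import tensorflow as tf

# Keras' `rate` argument is the probability of dropping a unit; if p in the text
# is meant as a keep probability (the TF-slim convention), use 1 - p instead.
dropout = tf.keras.layers.Dropout(rate=0.8)  # p = 0.8, reported later as giving the highest mAP
```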
BRIEF DESCRIPTION OF DRAWINGS FIG. 1 is a step diagram of a detection method according to the present invention; FIG. 2 is a flowchart of an F-SSD-IV3 algorithm; FIG. 3 is a network structure diagram of Inception V3; and FIG. 4 is a schematic diagram of a feature fusion method.
DETAILED DESCRIPTION The present invention is described in detail below with reference to embodiments and the accompanying drawings, but the present invention is not limited thereto.
(1) Experimental data: In the present invention, a field crop typical-pest data set collected by the Institute of Agricultural Information Technology, Zhejiang University is adopted. The pest images in the data set vary in image size, lighting conditions, degree of occlusion, shooting angle, and target pest size. Images in the database are randomly divided into a training set, a validation set, and a test set at a ratio of 7:2:1. The model is trained by using the data in the training set, evaluation is conducted by using the validation set to select the model parameters, and finally model performance and efficiency are measured by using the test set.
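One simple way to realize the 7:2:1 split is sketched below; `samples` is assumed to be a list of (image path, annotation path) pairs built from the database in step (1).

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle and split into training, validation and test sets at a 7:2:1 ratio."""
    rng = random.Random(seed)
    samples = samples[:]          # copy so the caller's list is left untouched
    rng.shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```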
(2) Experimental environment: Specifications of the experimental workstation are as follows: the memory is 32 GB, the operating system is Linux Ubuntu 18.04, and the CPU is an Intel Core i7-7800X. TensorFlow supports multi-GPU training, and a total of two NVIDIA GeForce GTX 1080Ti graphics cards are used for training in the present invention. Python is used as the programming language because it supports the TensorFlow deep learning framework.
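One possible way to use the two GPUs with TensorFlow is a mirrored distribution strategy, sketched below; the patent does not name a specific API, and `build_f_ssd_iv3()` is a hypothetical helper that assembles the detector.

```python
import tensorflow as tf

# Minimal sketch of two-GPU training setup (an assumption; other APIs are possible).
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    model = build_f_ssd_iv3()   # hypothetical model-building helper
```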
(3) A training process: First, data amplification is conducted to expand the training set, and a size of an input image is fixed at 300x300x3. Then a network is initialized, errors of a position loss function and a classification loss function are calculated through forward propagation, and parameters are updated through backpropagation until 200,000 iterations are completed, and finally the parameters are saved. In the experiment, a model Inception V3 trained on ImageNet is used as a feature extraction network of SSD through fine-tuning, and parameters of the
Inception V3 are used to initialize the parameters of the basic network to speed up the training of the network. Training hyperparameters are as follows: parameters are initialized with random numbers drawn from a standard normal distribution with a mean of 0 and a standard deviation of 0.1. A stochastic gradient descent (SGD) method with Momentum is used, the momentum weight is 0.9, and the attenuation coefficient is also set to 0.9. Compared with plain SGD, the Momentum optimizer alleviates two problems: noise introduction and relatively large convergence oscillation. The initial learning rate is set to 0.004, the exponential attenuation parameter is set to 0.95, and the batch size is set to 24. A total of 200,000 iterations are conducted, and one complete training run takes approximately 20 hours. During training, when the IoU of a candidate bound with a labeled rectangular box exceeds 0.6, the candidate bound is treated as a positive sample; otherwise, it is treated as a negative sample.
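The optimizer configuration described above might be expressed as follows in Keras; the decay interval (`decay_steps`) is an assumption, since the text gives only the decay factor of 0.95.

```python
import tensorflow as tf

# SGD with momentum 0.9 and an initial learning rate of 0.004 decayed exponentially by 0.95.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.004, decay_steps=10000, decay_rate=0.95)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```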
(4) Parameters of the model are continually adjusted according to the results on the validation set, and the trained optimal model is evaluated on the test set to determine the performance of the model. When p of the Dropout layer is 0.8, the mAP value is the highest. The F-SSD-IV3 algorithm proposed in the present invention is compared with the original SSD300, Faster R-CNN, and R-FCN target detection algorithms on the same test set, and the standard target detection performance evaluation indicator mAP proposed in the Pascal VOC Challenge is used as the performance indicator.
Table 1 Performance comparison of various algorithms
It can be learned from Table 1 that SSD300 has the best detection speed, namely, 0.048 seconds per single image, but its detection accuracy is the lowest; the detection accuracy of both Faster R-CNN and R-FCN is lower than 0.68, and Faster R-CNN and R-FCN take approximately 0.15 seconds to detect a single image. Compared with R-FCN and Faster R-CNN, F-SSD-IV3 has clear advantages in both detection accuracy and detection speed. Therefore, the F-SSD-IV3 proposed in the present invention can better balance detection accuracy and detection speed. The present invention has relatively high practical value for the real-time and accurate detection of pests in a field environment.
The foregoing descriptions are merely preferred examples of the present invention, but are not intended to limit the present invention. Any modifications, equivalent replacements or
improvements made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims:
1. A crop pest detection method based on Feature Fusion Single Shot Multibox Detector Inception V3 (F-SSD-IV3), which includes the following steps: (1) capturing pest images to construct a crop pest database; (2) constructing an F-SSD-IV3 target detection algorithm, generating feature maps at different scales from the images in the crop pest database by using Inception V3 as a feature extractor, performing feature fusion on the feature maps, and refining candidate boundaries by using Softer NMS; and (3) optimizing the target detection network by amplifying data and adding a dropout layer to obtain an optimal detection model used to detect crop pests in an image.
2. The crop pest detection method based on F-SSD-IV3 according to claim 1, wherein the crop pest database stores pest images with different image sizes, lighting conditions, degrees of occlusion, shooting angles, and target pest sizes.
3. The crop pest detection method based on F-SSD-IV3 according to claim 1, wherein step (2) specifically includes the following steps:
(2-1) selecting Inception V3 as a basic network of the F-SSD-IV3, wherein a structure of an Inception V3 network includes a convolutional layer, a convolutional layer, a convolutional layer, a pooling layer, a convolutional layer, a convolutional layer, a pooling layer, Mixed1_a, Mixed1_b, Mixed1_c, Mixed2_a, Mixed2_b, Mixed2_c, Mixed2_d, Mixed2_e, Mixed3_a, Mixed3_b, Mixed3_c, a pooling layer, a dropout layer, and a fully connected layer; wherein dimensions of convolution kernels include 1x1, 1x3, 3x1, 3x3, 5x5, 1x7, and 7x1; wherein the pooling layer includes maximum pooling and average pooling, and has a dimension of 3x3; and wherein sizes of the obtained feature maps are 149x149x32, 147x147x32, 147x147x64, 73x73x64, 73x73x80, 71x71x192, 35x35x192, 35x35x256, 35x35x288, 35x35x288, 17x17x768, 17x17x768, 17x17x768, 17x17x768, 17x17x768, 8x8x1280, 8x8x2048, 8x8x2048, and 1x1x2048;
(2-2) then adding an additional network of six convolutional layers after the Inception V3, wherein sizes of the convolution kernels are respectively 1x1x256, 3x3x512, 1x1x128, 3x3x256, 1x1x256, and 3x3x128, and obtaining three feature maps with gradually decreasing sizes, the sizes thereof being respectively 4x4x512, 2x2x256, and 1x1x128;
(2-3) performing feature fusion on the feature maps obtained in step (2-2), a Mixed1_c feature map, a Mixed2_e feature map, and a Mixed3_c feature map, and outputting new feature maps;
(2-4) performing convolution on k candidate boundaries at each position in an m×n feature map, wherein a size of a convolution kernel is (c+4)k, predicting c category scores and four position changes, and finally generating m×n×k(c+4) predicted outputs;
(2-5) using the NMS to preserve candidate boundaries with relatively high confidence coefficients, wherein a large number of overlapping candidate boundaries are generated; selecting a candidate boundary by using the Soft NMS; determining, for each selected candidate boundary M, whether an IoU of another candidate boundary with the candidate boundary M is greater than a threshold p; and performing weighted averaging on all candidate boundaries whose IoUs are greater than the threshold p, and updating position coordinates of the candidate boundaries; and
(2-6) wherein a loss function of the SSD is formed by two parts, a position loss L_loc and a classification loss L_conf, and can be represented as follows:
L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))
wherein N represents the number of candidate boundaries matching a real boundary, c is a confidence coefficient of each type of candidate boundary, l is a value of a translation and scale change of a candidate boundary, g is position information of the real boundary, and α = 1 by default.
4. The crop pest detection method based on F-SSD-IV3 according to claim 3, wherein in step (2-3), the feature fusion method comprises first performing deconvolution on the feature map at a next layer, then performing feature fusion on the feature map at the next layer and the feature map at a current layer in a cascading manner, and outputting a new feature map.
5. The crop pest detection method based on F-SSD-IV3 according to claim 3, wherein the output candidate boundaries in the network structure can be represented as the following formula:
Output candidate boundaries = {P_n(f_n), P_{n-1}(f'_{n-1}), ..., P_k(f'_k)}
where f'_{n-1} = f_{n-1} + f_n, ..., f'_k = f_k + f_{k+1} + ... + f_n, and n > k > 0; f_n represents the feature map output at the n-th cascaded layer, and P represents the candidate boundaries generated for each feature map.
6. The crop pest detection method based on F-SSD-IV3 according to claim 5, wherein the default aspect ratios of the candidate boundaries are set to a_r ∈ {1, 2, 3, 1/2, 1/3}, and when a_r = 1, an additional candidate boundary is added, a scale thereof being S'_k = √(S_k · S_{k+1}).
7. The crop pest detection method based on F-SSD-IV3 according to claim 1, wherein the data amplification in step (3) is represented as the following formula:
T = Ö(S)
where S represents the raw training data, T represents the data obtained after data amplification, and Ö is the adopted data amplification method; and the brightness, contrast, and saturation of an image are randomly adjusted, and flipping, rotation, cropping, and translation are applied to the image.
8. The crop pest detection method based on F-SSD-IV3 according to claim 1, wherein the dropout policy is as follows: during network training, some neurons at a hidden layer are randomly suppressed with a probability p in each iteration, and finally a comprehensive averaging policy is used to combine different neural networks into a final output model.
Similar technologies:
Publication number | Publication date | Patent title
NL2025689B1|2021-06-07|Crop pest detection method based on f-ssd-iv3
Xiong et al.2017|Panicle-SEG: a robust image segmentation method for rice panicles in the field based on deep learning and superpixel optimization
Barbedo2018|Factors influencing the use of deep learning for plant disease recognition
Waheed et al.2020|An optimized dense convolutional neural network model for disease recognition and classification in corn leaf
WO2020177432A1|2020-09-10|Multi-tag object detection method and system based on target detection network, and apparatuses
Saedi et al.2020|A deep neural network approach towards real-time on-branch fruit recognition for precision horticulture
Liu et al.2020|A novel and high precision tomato maturity recognition algorithm based on multi-level deep residual network
Mohammed Abdelkader et al.2021|Hybrid Elman neural network and an invasive weed optimization method for bridge defect recognition
Liang et al.2021|Comparison detector for cervical cell/clumps detection in the limited data scenario
Hao et al.2020|Growing period classification of Gynura bicolor DC using GL-CNN
Liu et al.2020|Disease spots identification of potato leaves in hyperspectral based on locally adaptive 1d-cnn
Ubbens et al.2020|Autocount: Unsupervised segmentation and counting of organs in field images
Wang et al.2021|Tomato anomalies detection in greenhouse scenarios based on YOLO-dense
CN112308825A|2021-02-02|SqueezeNet-based crop leaf disease identification method
He et al.2019|Spatial attention network for few-shot learning
Zhang et al.2021|Joint information fusion and multi-scale network model for pedestrian detection
Ju et al.2021|Classification of jujube defects in small data sets based on transfer learning
Song et al.2020|Multi-source remote sensing image classification based on two-channel densely connected convolutional networks.
Jia et al.2020|Dough-stage maize | ear recognition based on multiscale hierarchical features and multifeature fusion
Chu et al.2014|Automatic image annotation combining svms and knn algorithm
Jia et al.2021|FoveaMask: A fast and accurate deep learning model for green fruit instance segmentation
CN109308936B|2020-09-01|Grain crop production area identification method, grain crop production area identification device and terminal identification equipment
Manga2021|Plant Disease Classification using Residual Networks with MATLAB
Xin et al.2019|A Multi-scale Network based on Attention Mechanism for Hyperspectral Image Classification
Mitsakos et al.2019|Virtual Multimodal Automated Object Detection with Deep Neural Networks
Patent family:
Publication number | Publication date
CN110222215B|2021-05-04|
CN110222215A|2019-09-10|
NL2025689B1|2021-06-07|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
CN109191455A|2018-09-18|2019-01-11|西京学院|A kind of field crop pest and disease disasters detection method based on SSD convolutional network|
US7496228B2|2003-06-13|2009-02-24|Landwehr Val R|Method and system for detecting and classifying objects in images, such as insects and other arthropods|
US7286056B2|2005-03-22|2007-10-23|Lawrence Kates|System and method for pest detection|
CN107665355B|2017-09-27|2020-09-29|重庆邮电大学|Agricultural pest detection method based on regional convolutional neural network|
CN108399380A|2018-02-12|2018-08-14|北京工业大学|A kind of video actions detection method based on Three dimensional convolution and Faster RCNN|
CN109002755B|2018-06-04|2020-09-01|西北大学|Age estimation model construction method and estimation method based on face image|
CN109101994B|2018-07-05|2021-08-20|北京致远慧图科技有限公司|Fundus image screening method and device, electronic equipment and storage medium|
CN109740463A|2018-12-21|2019-05-10|沈阳建筑大学|A kind of object detection method under vehicle environment|CN110782435A|2019-10-17|2020-02-11|浙江中烟工业有限责任公司|Tobacco worm detection method based on deep learning model|
CN112464971A|2020-04-09|2021-03-09|丰疆智能软件科技有限公司|Method for constructing pest detection model|
CN113065473A|2021-04-07|2021-07-02|浙江天铂云科光电股份有限公司|Mask face detection and body temperature measurement method suitable for embedded system|
Legal status:
Priority:
Application number | Filing date | Patent title
CN201910470899.6A|CN110222215B|2019-05-31|2019-05-31|Crop pest detection method based on F-SSD-IV3|